An Aho-Corasick Based Assessment of Algorithms Generating Failure Deterministic Finite Automata

Authors

  • Madoda Nxumalo
  • Derrick G. Kourie
  • Loek G. Cleophas
  • Bruce W. Watson
Abstract

The Aho-Corasick algorithm derives a failure deterministic finite automaton for finding matches of a finite set of keywords in a text. This automaton has the minimum number of transitions needed for the task. The DFA-Homomorphic Algorithm (DHA) is more general: from an arbitrary complete deterministic finite automaton it derives a language-equivalent failure deterministic finite automaton. DHA takes as input the formal concepts of a lattice, which is built from a state/out-transition formal context derived from the complete deterministic finite automaton. In this paper, three general variants of the abstract DHA are benchmarked against the specialised Aho-Corasick algorithm. It is shown that when the heuristics for these variants are suitably chosen, the minimality attained by the Aho-Corasick algorithm can be closely approximated. A published non-lattice-based algorithm is also shown to perform well in the experiments.
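To make the notion of a failure automaton for keyword matching concrete, the following is a minimal Python sketch of the textbook Aho-Corasick construction. It is an illustration under stated assumptions, not the implementation benchmarked in the paper: the function names `build_aho_corasick` and `find_matches` and the list-of-dicts representation are chosen here for clarity only, and DHA itself is not shown.

```python
from collections import deque


def build_aho_corasick(keywords):
    """Build trie transitions, failure links, and output sets (illustrative names)."""
    goto = [{}]    # goto[state][symbol] -> next state (ordinary transitions)
    fail = [0]     # fail[state] -> state to fall back to (failure transition)
    out = [set()]  # out[state] -> keywords recognised on reaching this state

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    # Phase 2: breadth-first traversal to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches ending at the fallback state
    return goto, fail, out


def find_matches(text, keywords):
    """Return (start_index, keyword) pairs for every keyword occurrence in text."""
    goto, fail, out = build_aho_corasick(keywords)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure transitions until a symbol transition exists or the root is reached.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches
```

For example, `sorted(find_matches("ushers", ["he", "she", "his", "hers"]))` reports `she` starting at index 1 and both `he` and `hers` starting at index 2. Each state stores only its trie transitions plus a single failure link, rather than one transition per alphabet symbol as a complete deterministic finite automaton would.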

Similar resources

A Flexible Pattern-Matching Algorithm for Network Intrusion Detection Systems Using Multi-Core Processors

As part of network security processes, network intrusion detection systems (NIDSs) determine whether incoming packets contain malicious patterns. Pattern matching, the key NIDS component, consumes large amounts of execution time. One of several trends involving general-purpose processors (GPPs) is their use in software-based NIDSs. In this paper, we describe our proposal for an efficient and fl...

A missing link in root-to-frontier tree pattern matching

Tree pattern matching (tpm) algorithms play an important role in practical applications such as compilers and XML document validation. Many tpm algorithms based on tree automata have appeared in the literature. For reasons of efficiency, these automata are preferably deterministic. Deterministic root-to-frontier tree automata (drftas) are less powerful than nondeterministic ones, and no root-to...

An Efficient Linear Pseudo-minimization Algorithm for Aho-Corasick Automata

A classical construction of Aho and Corasick solves the pattern matching problem for a finite set of words X in linear time, where the size of the input X is the sum of the lengths of its elements. It produces an automaton that recognizes A∗X, where A is a finite alphabet, but which is generally not minimal. As an alternative to classical minimization algorithms, which yield an O(n log n) soluti...

Multiple Pattern String Matching Methodologies: A Comparative Analysis

The use of string matching algorithms in software applications such as virus scanners (anti-virus) or intrusion detection systems is stressed for improving data security over the internet. String-matching techniques are used for sequence analysis, gene finding, evolutionary biology studies and analysis of protein expression. Other fields such as Music Technology, Computational Linguistics, Artificial Intelli...

EffCLiP: Efficient Coupled-Linear Packing for Finite Automata

Finite automata are widely recognized as a fundamental computing model with a broad range of applications, notably network monitoring. We propose a new approach, “efficient coupled-linear packing” (EffCLiP), that optimizes both finite-automata size and performance. EffCLiP employs a novel transition representation that enables a simple addressing operator (integer addition) while providing flex...

Publication date: 2015